Abstract

In this paper, distributed maximum likelihood estimation (MLE) with quantized data is considered 
under the assumption that the structure of the joint probability density function (pdf) is known, but it 
contains unknown deterministic parameters. The parameters may include different vector parameters 
corresponding to marginal pdfs and parameters that describe dependence of observations across sensors. 
We first discuss the regularity conditions which should be satisfied by the pdf and vector quantizers 
such that the MLE with quantized data is asymptotically efficient. Then, the relationship between the 
asymptotic variance of the MLE and the number of quantization bits is analytically derived. Since the 
optimal MLE scheme based on quantized data cannot be obtained when the joint pdf of observations is 
not completely known, a robust distributed MLE scheme is designed for a fixed number of quantization 
bits. Its asymptotic efficiency is proved under some regularity conditions and the asymptotic variance 
is derived so that the robustness can be analytically assessed. A numerical example with a bivariate 
Gaussian pdf is considered. Simulations demonstrate the robustness of the proposed MLE scheme.

keywords: Maximum likelihood estimation; distributed estimation; quantized data; asymptotic variance; 
Fisher information matrix; wireless sensor networks 



1 Introduction 

Wireless sensor networks have attracted much attention and have become a fast-growing research area over 
the past years. Many advances have been made in distributed detection, estimation, tracking and control 
(see, e.g., [1, 2, 3, 4, 5, 6] and references therein). Due to limited communication bandwidth and energy, the
sensors usually quantize the measurements and send them to a fusion center which makes the final inference 
for decision, estimation, tracking or control tasks. The focus of this paper is on robust distributed MLE 
with quantized data. Distributed estimation and quantization problems have been considered in a number of 
studies. The parameters to be estimated are modeled as either random or deterministic, depending on the situation.

For random parameters, there exist various prior studies under the assumption of a known joint pdf of
parameters and sensor measurements (see [7, 8, 9, 10, 11, 12]). The work in [9] presented necessary conditions
for optimal quantizers and optimal estimation. Optimum quantizers are difficult to obtain because coupled
nonlinear equations need to be solved. The optimum estimator takes the form of a conditional mean that is
usually hard to compute. To simplify computation, an efficient vector quantization algorithm based on the
best linear unbiased estimator was proposed in [10, 11]. The convergence of the algorithm in [11] is
guaranteed. The authors of [12] proposed a quantization approach where only a training sequence is available.

For deterministic parameters, several universal distributed estimation schemes have been proposed
[13, 14, 15] in the presence of unknown, additive sensor noises that are bounded and identically distributed.
These universal distributed estimation schemes have a low bandwidth requirement. The work in
[16, 17, 18, 19, 20, 21] addressed various design and implementation issues under the assumption of a scalar
parameter and using scalar quantizers. An approach consisting of multiple non-identical thresholds is
employed in [16, 17]. The authors of [17] studied estimation of a scalar mean location parameter in the
presence of zero-mean additive white Gaussian noise. Methods to design quantizers by minimizing the worst
case performance in bounded parameter sets were proposed in [18]. In [19], the performance limit that a
distributed estimation scheme with identical quantizers can achieve was found, as well as the set of optimal
noise distribution functions and quantizers. In [20, 21], a quantization approach that adaptively adjusts
the thresholds from sensor to sensor was proposed. The work of [22, 23, 24] proposed vector quantization
design for distributed estimation under the assumption of an additive observation noise model. The authors
of [22] proposed a hyperplane-based heuristic approach, where the vector quantization problem can be
converted to scalar quantization problems. In [24], a class of hyperplane-based vector quantizers was
proposed which linearly convert the observation vector into a scalar by using a compression vector and then
carry out scalar quantization.

When the structure of the pdf is known, the MLE with quantized data has been used extensively in previous
works to estimate deterministic parameters. In this paper, robust distributed MLE with quantized data is
considered. Our work differs from previous studies in several aspects. Prior results concentrate on the
problem of how to design the quantization schemes for estimating a deterministic parameter where each sensor
makes one noisy observation. The observations are usually assumed independent across sensors, and the
relationship between MLE performance and the number of sensors is then discussed. Here, we focus on the
problem of how to design estimation schemes for the unknown parameter vector corresponding to the joint pdf
of the observations where the number of sensors is fixed. These observations may be dependent across sensors.
The unknown parameters may include different vector parameters corresponding to marginal pdfs and parameters
that describe dependence of observations across sensors. Indeed, the dependence between sensors is very
important in multisensor fusion systems; see, for example, the recent work on distributed Neyman-Pearson
detection fusion, hypothesis testing using heterogeneous data, and distributed location estimation with
dependent sensor observations [25, 26, 27]. We also derive the relationship between MLE performance, the
number of quantization bits and the number of observations. It is worth noting that our work requires neither
knowledge of the observation models, nor assumptions of Gaussianity and independence of noises across
sensors, nor scalar quantizers and scalar estimated parameters.

In this paper, we first determine the regularity conditions which should be satisfied by the joint pdf 
and quantizers such that the MLE with quantized data is asymptotically efficient. Then, the relationship 
between the asymptotic variance of MLE and the number of quantization bits is analytically derived. We 
shall prove that the asymptotic variance of MLE with quantized data is monotone decreasing with the number 
of quantization bits and has a lower limit, which is equal to the asymptotic variance of MLE with raw 
measurements. When the number of quantization bits is given, a robust distributed MLE scheme is designed 
by employing J different quantizers. Its asymptotic efficiency is proved under some regularity conditions 
and the asymptotic variance is derived to be the inverse of a convex linear combination of Fisher information 
matrices based on J different quantizers. Thus, the robustness can be analytically verified. A numerical 
example with a bivariate Gaussian pdf with an unknown parameter vector is considered. Simulations show 
that the new MLE scheme is robust and much better than that based on the worst quantization scheme from 
among the groups of quantizers. Another interesting phenomenon is that the asymptotic variance of the 
estimates of parameters of marginal pdfs is almost independent of the dependence between the sensors. 

The rest of the paper is organized as follows. Problem formulation is given in Section 2. In Section 3,
the performance analysis of MLE with quantized data is given. The robust MLE scheme is proposed and
the asymptotic results are derived. In Section 4, numerical examples are given and discussed. In Section 5,
conclusions are made.



2 Problem formulation 

The basic $L$-sensor distributed estimation system is considered (see Figure 1). Each sensor has a
$k_i$-dimensional observation population $Y_i$, $i = 1, \ldots, L$. Suppose that the joint observation
population $Y = (Y_1', \ldots, Y_L')'$ has a given family of joint pdf:

$$\{p(y_1, \ldots, y_L \mid \theta)\}_{\theta \in \Theta \subset \mathbb{R}^k} \tag{1}$$

where $'$ denotes the transpose and $\theta$ is the unknown $k$-dimensional deterministic parameter vector
which may include marginal parameters and dependence parameters. Here, we do not assume independence across
sensors, knowledge of measurement models or Gaussianity of the joint pdf. Let $N$ independently and
identically distributed (i.i.d.) sensor observation samples and joint observation samples be

$$\vec{Y}_i = (Y_{i1}, \ldots, Y_{iN}), \quad i = 1, \ldots, L; \tag{2}$$

$$\vec{Y} = (\vec{Y}_1', \ldots, \vec{Y}_L')'. \tag{3}$$
Suppose the sensors and the fusion center wish to jointly estimate the unknown parameter vector $\theta$
based on the spatially distributed observations. If there is sufficient communication bandwidth and power,
the fusion center can obtain asymptotically efficient estimates with the complete observation samples based
on the MLE procedure under some regularity conditions on the joint pdf.

Figure 1: Distributed MLE fusion system with quantized data

In many practical situations, however, to reduce the communication requirement from sensors to the
fusion center due to limited communication bandwidth and power, the $i$-th sensor quantizes the observation
vector to $r_i$ bits ($r_i \ge 1$) by $r_i$ measurable indicator quantization functions:

$$I_i^1(y_i): y_i \in \mathbb{R}^{k_i} \to \{0, 1\}; \ \ldots; \ I_i^{r_i}(y_i): y_i \in \mathbb{R}^{k_i} \to \{0, 1\}, \tag{4}$$

for $i = 1, \ldots, L$. Here, each quantizer $I_i^j(y_i)$ may have one or multiple thresholds, and its 0/1
quantization region may be a continuous region or a union of discontinuous regions. Moreover, we denote by

$$I(y \mid r) \triangleq (I_1(y_1)', \ldots, I_L(y_L)')' \in \mathbb{R}^r, \quad r = \sum_{i=1}^{L} r_i, \tag{5}$$

where

$$I_i(y_i) \triangleq (I_i^1(y_i), \ldots, I_i^{r_i}(y_i))', \quad i = 1, \ldots, L, \tag{6}$$

and $r$ is the total number of bits available to transmit observations from the sensors to the fusion center.
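
As an illustration of definitions (4)-(6), the following Python sketch stacks per-sensor indicator bits into one $r$-bit vector. The helper names (`threshold_bit`, `band_bit`, `quantize`), the thresholds and the two-sensor setup are assumptions made for this example only; they are not taken from the paper.

```python
from typing import Callable, List, Sequence

# One indicator quantization function: R^{k_i} -> {0, 1}.
Bit = Callable[[Sequence[float]], int]

def threshold_bit(coord: int, c: float) -> Bit:
    """Single-threshold indicator: 1 if y[coord] > c, else 0."""
    return lambda y: 1 if y[coord] > c else 0

def band_bit(coord: int, lo: float, hi: float) -> Bit:
    """Two-threshold indicator whose 1-region is a union of two disjoint rays."""
    return lambda y: 1 if (y[coord] <= lo or y[coord] > hi) else 0

def quantize(y: Sequence[Sequence[float]], sensors: Sequence[Sequence[Bit]]) -> List[int]:
    """Stack the r_i bits of every sensor into one r-bit vector, r = sum_i r_i."""
    bits: List[int] = []
    for y_i, bit_fns in zip(y, sensors):
        bits.extend(fn(y_i) for fn in bit_fns)
    return bits

# Hypothetical setup: sensor 1 sends r_1 = 2 bits, sensor 2 sends r_2 = 1 bit.
sensors = [
    [threshold_bit(0, 0.0), band_bit(0, -1.0, 1.0)],
    [threshold_bit(1, 2.5)],
]
u = quantize([(0.4,), (1.0, 3.0)], sensors)
```

Here `band_bit` shows a quantizer with multiple thresholds whose 1-region is a union of disconnected regions, which the definition above explicitly allows.
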
Once the $r_i$-bit binary quantized samples $I_i(Y_{in})$, $n = 1, \ldots, N$, are generated at sensor $i$,
$i = 1, \ldots, L$, they are transmitted to the fusion center. The fusion center is then required to estimate
the true parameter vector $\theta^*$ with the quantized data. Usually, the MLE is used to estimate the
parameter vector by maximizing the log likelihood function

$$\begin{aligned}
l(\theta \mid \vec{Y}) &= \log P(I(\vec{Y} \mid r) \mid \theta) \\
&= \log \prod_{n=1}^{N} P(I_1(Y_{1n}), \ldots, I_L(Y_{Ln}) \mid \theta) \\
&= \sum_{n=1}^{N} \log P(I_1(Y_{1n}), \ldots, I_L(Y_{Ln}) \mid \theta) \\
&= \sum_{n=1}^{N} \log \int_{\{I_1(y_1) = I_1(Y_{1n}), \ldots, I_L(y_L) = I_L(Y_{Ln})\}} p(y_1, \ldots, y_L \mid \theta) \, dy_1 \cdots dy_L.
\end{aligned} \tag{7}$$

In the framework of such a distributed multisensor fusion system, one may question whether this method can
estimate the parameters efficiently, since raw measurements $y_i$ may be compressed to as low as 1-bit
$I_i(y_i)$, $i = 1, \ldots, L$, so that a lot of information can be lost. Indeed, there are examples where
MLE with quantized data cannot estimate parameters well; one is given in Remark 3.3. On the other hand,
there are also examples where the MLE method with quantized data yields good estimation results (see, e.g.,
[25, 27]). Thus, in the present paper, we shall concentrate on analytically investigating the following
basic problems:

• What conditions should the pdf $p(y_1, \ldots, y_L \mid \theta)$ and the quantizers $I(y \mid r)$ satisfy
to guarantee that MLE with quantized data will be an asymptotically efficient estimator?

• What is the relationship between the asymptotic variance of MLE and the total number of quantization
bits $r$?

• How should $I(y \mid r)$ be designed to derive a robust and asymptotically efficient MLE for a given number
of bits $r$?

3 Performance Analysis of Maximum Likelihood Estimation with Quantized 
Data 

3.1 Asymptotic efficiency of maximum likelihood estimation with quantized data 

By the definition of observation samples and quantizers, we define

$$\vec{U} \triangleq (\vec{U}_1', \ldots, \vec{U}_L')', \tag{8}$$

$$\vec{U}_i \triangleq (U_{i1}, \ldots, U_{iN})', \quad i = 1, \ldots, L, \tag{9}$$

$$U_n \triangleq (U_{1n}', \ldots, U_{Ln}')', \quad n = 1, \ldots, N, \tag{10}$$

$$U_{1n} \triangleq I_1(Y_{1n}), \ \ldots, \ U_{Ln} \triangleq I_L(Y_{Ln}). \tag{11}$$

If we take $\vec{U}$ as the joint quantized observation samples and denote the quantized observation
population by $U = I(Y \mid r) = (I_1(Y_1), \ldots, I_L(Y_L))'$, we know that $U$ has a discrete/categorical
distribution. Based on the pdf of $Y$ and quantizers $I(y \mid r)$, the probability mass function (pmf) of
the quantized observation population $U$ is

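$$f_U(u_1, u_2, \ldots, u_L \mid \theta) = P_{u_1, u_2, \ldots, u_L} \quad \text{for } U = (u_1, u_2, \ldots, u_L), \tag{12}$$

where

$$(u_1, u_2, \ldots, u_L) \in S_u = \Big\{(u_1, u_2, \ldots, u_L) \in \mathbb{R}^r : u_i \text{ is an } r_i\text{-dimensional binary row vector}, \ i = 1, \ldots, L, \ r = \sum_{i=1}^{L} r_i \Big\}, \tag{13}$$

$$P_{u_1, u_2, \ldots, u_L} = \int_{\mathcal{S}_{(u_1, u_2, \ldots, u_L)}} p(y_1, y_2, \ldots, y_L \mid \theta) \, dy_1 dy_2 \cdots dy_L, \tag{14}$$

$$\mathcal{S}_{(u_1, u_2, \ldots, u_L)} = \{(y_1, y_2, \ldots, y_L) : I_1(y_1) = u_1, I_2(y_2) = u_2, \ldots, I_L(y_L) = u_L\}. \tag{15}$$

Note that $f_U(u_1, u_2, \ldots, u_L \mid \theta)$ is determined by $p(y_1, y_2, \ldots, y_L \mid \theta)$ and
the sensor quantizers $I_1(y_1), \ldots, I_L(y_L)$.

Thus, the quantized observation population $U$ has a family of joint pmf
$\{f_U(u_1, u_2, \ldots, u_L \mid \theta)\}_{\theta \in \Theta \subset \mathbb{R}^k}$ which, by the
definitions above and (12)-(15), yields the following log likelihood function of samples $\vec{U}$:

$$\begin{align}
l(\theta \mid \vec{U}) &= \log \prod_{n=1}^{N} f_U(U_{1n}, U_{2n}, \ldots, U_{Ln} \mid \theta) \tag{16} \\
&= \sum_{n=1}^{N} \log \int_{\{I_1(y_1) = U_{1n}, \ldots, I_L(y_L) = U_{Ln}\}} p(y_1, \ldots, y_L \mid \theta) \, dy_1 \cdots dy_L \notag \\
&= \sum_{n=1}^{N} \log \int_{\{I_1(y_1) = I_1(Y_{1n}), \ldots, I_L(y_L) = I_L(Y_{Ln})\}} p(y_1, \ldots, y_L \mid \theta) \, dy_1 \cdots dy_L \tag{17} \\
&= l(\theta \mid \vec{Y}). \tag{18}
\end{align}$$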

Therefore, MLE with quantized data for the family
$\{p(y_1, \ldots, y_L \mid \theta)\}_{\theta \in \Theta \subset \mathbb{R}^k}$ is equivalent to MLE for the
family $\{f_U(u_1, u_2, \ldots, u_L \mid \theta)\}_{\theta \in \Theta \subset \mathbb{R}^k}$. Hence, if MLE
for the latter is an asymptotically efficient estimator, then MLE for the former is also asymptotically
efficient. Thus, we only need to provide conditions on $p(y_1, y_2, \ldots, y_L \mid \theta)$ and the sensor
quantizers $I_1(y_1), \ldots, I_L(y_L)$ such that the classical regularity conditions for
$f_U(u_1, u_2, \ldots, u_L \mid \theta)$ are satisfied; the asymptotic efficiency of MLE with quantized data
is then guaranteed.

Moreover, by (16) and (13), the log likelihood function is also equal to

$$l(\theta \mid \vec{U}) = \sum_{n=1}^{N} \log f_U(U_{1n}, U_{2n}, \ldots, U_{Ln} \mid \theta) = \sum_{j=1}^{2^r} n_j \log f_U(u_j \mid \theta) \tag{19}$$

where $n_j = \#\{(U_{1n}, U_{2n}, \ldots, U_{Ln}) = u_j \in S_u, \ n = 1, \ldots, N\}$,
$\sum_{j=1}^{2^r} n_j = N$, and $\#\{\cdot\}$ is the cardinality of the set. The parameter vector $\theta$ is
estimated by maximizing the log likelihood function (19) or equivalently solving the equation:

$$\frac{\partial}{\partial \theta} l(\theta \mid \vec{U}) = 0. \tag{20}$$
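
To make the count form (19) concrete, here is a minimal Python sketch for a toy case that is not from the paper: a single sensor with $Y \sim N(\theta, 1)$ and the two bits $(1[y > 0], 1[y > 1])$. The thresholds, the pattern counts and the grid search standing in for solving (20) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def norm_cdf(x):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def pmf(u, theta):
    """pmf f_U(u | theta) for Y ~ N(theta, 1) and the bits (1[y > 0], 1[y > 1])."""
    p0, p1 = norm_cdf(0.0 - theta), norm_cdf(1.0 - theta)
    # The pattern (0, 1) has probability zero and is never observed.
    return {(0, 0): p0, (1, 0): p1 - p0, (1, 1): 1.0 - p1}.get(u, 0.0)

def log_likelihood(counts, theta):
    """Count form of the log likelihood: sum_j n_j * log f_U(u_j | theta)."""
    return sum(n * math.log(pmf(u, theta)) for u, n in counts.items())

def mle(counts, lo=-5.0, hi=5.0, steps=10001):
    """Grid search used here in place of solving the likelihood equation."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(counts, t))

# Hypothetical quantized data: N = 100 samples summarized by pattern counts n_j.
samples = [(0, 0)] * 30 + [(1, 0)] * 40 + [(1, 1)] * 30
counts = Counter(samples)
theta_hat = mle(counts)
```

Only the counts $n_j$ enter the likelihood, so the fusion center never needs the order in which the patterns arrived.
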

Based on the classical asymptotic properties of MLE (see, e.g., [28, 29]), we have the following two
lemmas.

Lemma 3.1. Assume that $p(y_1, y_2, \ldots, y_L \mid \theta)$ and sensor quantizers
$I_1(y_1), \ldots, I_L(y_L)$ generate the quantized samples and $f_U(u_1, u_2, \ldots, u_L \mid \theta)$
satisfies the regularity conditions:

(C1) $I_i(Y_{i1}), \ldots, I_i(Y_{iN})$ are i.i.d., $i = 1, \ldots, L$;

(C2) The model parameter is identifiable:

$$\theta \neq \theta' \Rightarrow f_U(u_1, u_2, \ldots, u_L \mid \theta) \neq f_U(u_1, u_2, \ldots, u_L \mid \theta'); \tag{21}$$

(C3) $f_U(u_1, u_2, \ldots, u_L \mid \theta)$ is differentiable with respect to parameter vector $\theta$;

(C4) The parameter space contains an open set of which the true parameter $\theta^*$ is an interior point.

Then, Equation (20) has a solution denoted by $\hat{\theta}$ which is a consistent estimator of $\theta^*$,
i.e., $\hat{\theta} \to \theta^*$, a.s.

The next three conditions and (C1)-(C4) are sufficient to guarantee asymptotic normality and efficiency
of MLE.

Lemma 3.2. Assume that $p(y_1, y_2, \ldots, y_L \mid \theta)$ and sensor quantizers
$I_1(y_1), \ldots, I_L(y_L)$ generate the quantized samples and $f_U(u_1, u_2, \ldots, u_L \mid \theta)$
satisfies the regularity conditions (C1)-(C4) and

(C5) For every $(u_1, \ldots, u_L) \in S_u$, $f_U(u_1, u_2, \ldots, u_L \mid \theta)$ is three times
differentiable with respect to parameter vector $\theta$, and
$\int f_U(u_1, u_2, \ldots, u_L \mid \theta) \, du_1 \cdots du_L$ can be differentiated three times under the
integral sign;

(C6) For any $\theta^*$ in the parameter space, there exists a positive number $\epsilon$ and a function
$M(u_1, \ldots, u_L)$ such that

$$\left| \frac{\partial^3}{\partial \theta_i \partial \theta_j \partial \theta_t} \log f_U(u_1, u_2, \ldots, u_L \mid \theta) \right| \le M(u_1, \ldots, u_L) \quad \text{for all } (u_1, \ldots, u_L) \in S_u, \tag{22}$$

$\theta$ in the $\epsilon$-neighborhood of $\theta^*$, with $E_{\theta^*}[M(U_1, \ldots, U_L)] < \infty$;

(C7) The Fisher information matrix exists and is nonsingular:

$$\mathcal{I}(\theta^*, I(\cdot \mid r)) = E_{\theta^*}\left[ \frac{\partial \log f_U(U_1, U_2, \ldots, U_L \mid \theta)}{\partial \theta} \left( \frac{\partial \log f_U(U_1, U_2, \ldots, U_L \mid \theta)}{\partial \theta} \right)' \right] \succ 0. \tag{23}$$

Then,

$$\sqrt{N}(\hat{\theta} - \theta^*) \longrightarrow N\big(0, \mathcal{I}^{-1}(\theta^*, I(\cdot \mid r))\big) \tag{24}$$

where $\mathcal{I}^{-1}(\theta^*, I(\cdot \mid r))$ is the Cramer-Rao lower bound for one quantized sample,
which depends on the form and number of bits of the quantizer $I(y \mid r)$. That is, $\hat{\theta}$ is a
consistent and asymptotically efficient estimator of $\theta^*$.



Remark 3.3. Note that the identifiability of $f_U(u_1, u_2, \ldots, u_L \mid \theta)$ implies that
$p(y_1, y_2, \ldots, y_L \mid \theta)$ is identifiable. If not, there are $\theta \neq \theta'$ such that
$p(y_1, y_2, \ldots, y_L \mid \theta) = p(y_1, y_2, \ldots, y_L \mid \theta')$ so that, by (12) and (14),
$f_U(u_1, u_2, \ldots, u_L \mid \theta) = f_U(u_1, u_2, \ldots, u_L \mid \theta')$, which yields a
contradiction. Conversely, the identifiability of $p(y_1, y_2, \ldots, y_L \mid \theta)$ does not imply that
$f_U(u_1, u_2, \ldots, u_L \mid \theta)$ is identifiable. For example, let



$$(Y_1, Y_2) \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \theta \\ \theta & 1 \end{pmatrix} \right), \quad |\theta| < 1$$



and

$$I_1(y_1) = I_2(y_2) = I(x) = \begin{cases} 1, & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases}$$

We know that the normal pdf $p(y_1, y_2 \mid \theta)$ is identifiable. However, by (12), the pmf of
$U = (I_1(Y_1), I_2(Y_2))$ is

$$f_U(u_1, u_2 \mid \theta) = 1/4, \quad \text{for all } |\theta| < 1,$$

which is not identifiable. In this case, we would not be able to distinguish between two parameters even
with an infinite amount of data by MLE. For the above example, however, it is easy to make
$f_U(u_1, u_2 \mid \theta)$ identifiable by choosing unsymmetric $I_1(y_1)$ and $I_2(y_2)$.
3.2 MLE with multiple bit quantized data and Cramer-Rao lower bound



In this subsection, the relationship between Cramer-Rao lower bound (i.e., asymptotic variance of MLE with 
quantized data) and the number of bits for quantization is analytically determined. We first derive a useful 
lemma. 

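Lemma 3.4. If $b = [b_1, b_2, \ldots, b_k]'$, $c = [c_1, c_2, \ldots, c_k]'$, $a = b + c$ and $g = d + e$,
where $d$, $e$ are positive constants, then

$$\frac{aa'}{g} \preceq \frac{bb'}{d} + \frac{cc'}{e} \tag{25}$$

where the equality is obtained at $b/d = c/e$.

Proof. Since $(cd - be)(cd - be)'$ is a non-negative definite matrix, we have

$$\begin{aligned}
& (cd - be)(cd - be)' \succeq 0 \\
\Leftrightarrow \ & cc'd^2 + bb'e^2 \succeq cb'de + bc'de \\
\Leftrightarrow \ & cc'd^2 + bb'e^2 + bb'de + cc'de \succeq cb'de + bc'de + bb'de + cc'de \\
\Leftrightarrow \ & (bb'e + cc'd)(d + e) \succeq (b + c)(b + c)'de \\
\Leftrightarrow \ & \frac{aa'}{d + e} \preceq \frac{bb'}{d} + \frac{cc'}{e}.
\end{aligned}$$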
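
A quick numerical sanity check of inequality (25) can be sketched as follows. The matrix ordering is tested through quadratic forms $x'(\cdot)x$ at random vectors $x$, which is only a spot check, not a proof; the dimension, the random ranges and the helper name `quad_gap` are arbitrary assumptions for this example.

```python
import random

def quad_gap(b, c, d, e, x):
    """Quadratic form x' (bb'/d + cc'/e - aa'/g) x with a = b + c and g = d + e."""
    xb = sum(xi * bi for xi, bi in zip(x, b))
    xc = sum(xi * ci for xi, ci in zip(x, c))
    return xb * xb / d + xc * xc / e - (xb + xc) ** 2 / (d + e)

rng = random.Random(0)
k = 4
gaps = []
for _ in range(1000):
    b = [rng.gauss(0.0, 1.0) for _ in range(k)]
    c = [rng.gauss(0.0, 1.0) for _ in range(k)]
    x = [rng.gauss(0.0, 1.0) for _ in range(k)]
    d, e = rng.uniform(0.1, 2.0), rng.uniform(0.1, 2.0)
    gaps.append(quad_gap(b, c, d, e, x))
min_gap = min(gaps)  # should be non-negative up to rounding error

# Equality case: choosing c = b * e / d (i.e. b/d = c/e) makes the gap vanish.
b = [1.0, -2.0, 0.5, 3.0]
d, e = 0.4, 1.6
c = [bi * e / d for bi in b]
equality_gap = quad_gap(b, c, d, e, [0.3, 0.7, -1.1, 2.0])
```

Algebraically the gap equals $(e\,x'b - d\,x'c)^2 / (de(d + e))$, which is the scalar version of the first line of the proof.
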

Theorem 3.5. If we denote the Fisher information matrix with one raw measurement by $\mathcal{I}(\theta)$,
then the relationship between the number of bits and the Cramer-Rao lower bound (the asymptotic variance of
MLE with quantized data) satisfies

$$\mathcal{I}^{-1}(\theta^*, I(\cdot \mid r)) \succeq \mathcal{I}^{-1}(\theta^*, I(\cdot \mid r + 1)), \tag{26}$$

and

$$\lim_{r \to \infty} \mathcal{I}^{-1}(\theta^*, I(\cdot \mid r)) = \mathcal{I}^{-1}(\theta^*), \tag{27}$$

where $r \to \infty$ means $r_i \to \infty$, $i = 1, \ldots, L$. The inequality in terms of Fisher information
matrices can be obtained from (26) by taking the inverses and flipping the direction of the inequality.
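Proof. We denote the $r$-bit and $(r + 1)$-bit quantized observation populations by $U^r$ and $U^{r+1}$
respectively. The pmfs of $U^r$ and $U^{r+1}$ are defined in a similar manner by (12). Here, they are denoted
by $f_{U^r}(u_1, \ldots, u_r \mid \theta)$ and $f_{U^{r+1}}(u_1, \ldots, u_r, u_{r+1} \mid \theta)$
respectively, where $u_i = 0/1$, $i = 1, \ldots, r + 1$. Thus, we have

$$f_{U^r}(u_1, \ldots, u_r \mid \theta) = f_{U^{r+1}}(u_1, \ldots, u_r, 0 \mid \theta) + f_{U^{r+1}}(u_1, \ldots, u_r, 1 \mid \theta). \tag{28}$$

Moreover,

$$\frac{\partial}{\partial \theta_i} f_{U^r}(u_1, \ldots, u_r \mid \theta) = \frac{\partial}{\partial \theta_i} f_{U^{r+1}}(u_1, \ldots, u_r, 0 \mid \theta) + \frac{\partial}{\partial \theta_i} f_{U^{r+1}}(u_1, \ldots, u_r, 1 \mid \theta), \tag{29}$$

for $i = 1, \ldots, k$, where $k$ is the dimension of parameter vector $\theta$.

The Fisher information matrices for $U^r$ and $U^{r+1}$ can be derived respectively as follows.

$$\mathcal{I}(\theta^*, I(\cdot \mid r)) = E\left[ \frac{\frac{\partial}{\partial \theta} f_{U^r}(U^r)}{f_{U^r}(U^r)} \left( \frac{\frac{\partial}{\partial \theta} f_{U^r}(U^r)}{f_{U^r}(U^r)} \right)' \right] = \sum_{u_i = 0/1, \ i = 1, \ldots, r} \frac{a_{u_1, \ldots, u_r} \, a_{u_1, \ldots, u_r}'}{g_{u_1, \ldots, u_r}}, \tag{30}$$

where

$$a_{u_1, \ldots, u_r} = \left[ \frac{\partial}{\partial \theta_1} f_{U^r}(u_1, \ldots, u_r \mid \theta), \ldots, \frac{\partial}{\partial \theta_k} f_{U^r}(u_1, \ldots, u_r \mid \theta) \right]', \tag{31}$$

$$g_{u_1, \ldots, u_r} = f_{U^r}(u_1, \ldots, u_r \mid \theta); \tag{32}$$

$$\mathcal{I}(\theta^*, I(\cdot \mid r + 1)) = E\left[ \frac{\frac{\partial}{\partial \theta} f_{U^{r+1}}(U^{r+1})}{f_{U^{r+1}}(U^{r+1})} \left( \frac{\frac{\partial}{\partial \theta} f_{U^{r+1}}(U^{r+1})}{f_{U^{r+1}}(U^{r+1})} \right)' \right] = \sum_{u_i = 0/1, \ i = 1, \ldots, r} \left( \frac{b_{u_1, \ldots, u_r, 0} \, b_{u_1, \ldots, u_r, 0}'}{d_{u_1, \ldots, u_r, 0}} + \frac{c_{u_1, \ldots, u_r, 1} \, c_{u_1, \ldots, u_r, 1}'}{e_{u_1, \ldots, u_r, 1}} \right), \tag{33}$$

where

$$b_{u_1, \ldots, u_r, 0} = \left[ \frac{\partial}{\partial \theta_1} f_{U^{r+1}}(u_1, \ldots, u_r, 0 \mid \theta), \ldots, \frac{\partial}{\partial \theta_k} f_{U^{r+1}}(u_1, \ldots, u_r, 0 \mid \theta) \right]', \tag{34}$$

$$c_{u_1, \ldots, u_r, 1} = \left[ \frac{\partial}{\partial \theta_1} f_{U^{r+1}}(u_1, \ldots, u_r, 1 \mid \theta), \ldots, \frac{\partial}{\partial \theta_k} f_{U^{r+1}}(u_1, \ldots, u_r, 1 \mid \theta) \right]', \tag{35}$$

$$d_{u_1, \ldots, u_r, 0} = f_{U^{r+1}}(u_1, \ldots, u_r, 0 \mid \theta), \quad e_{u_1, \ldots, u_r, 1} = f_{U^{r+1}}(u_1, \ldots, u_r, 1 \mid \theta). \tag{36}$$

By Lemma 3.4 and Equations (28)-(36), we have

$$\mathcal{I}(\theta^*, I(\cdot \mid r)) \preceq \mathcal{I}(\theta^*, I(\cdot \mid r + 1)).$$

Thus,

$$\mathcal{I}^{-1}(\theta^*, I(\cdot \mid r)) \succeq \mathcal{I}^{-1}(\theta^*, I(\cdot \mid r + 1)).$$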

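Furthermore, quantization using $r$ bits means that the measurement space is divided into $2^r$ regions.
When $r$ is large enough, for notational simplicity, we consider $\mathbb{R}^2$ to be divided into $2^r$
regions: $A_1 = (-\infty, -M_r] \times \mathbb{R}$, $A_2 = (M_r, +\infty) \times \mathbb{R}$,
$A_3 = (-M_r, M_r] \times (-\infty, -M_r]$, $A_4 = (-M_r, M_r] \times (M_r, +\infty)$, and the square
$(-M_r, M_r] \times (-M_r, M_r]$ is further divided into $2^r - 4$ small regions $A_i$,
$i = 5, 6, \ldots, 2^r$, where $M_r$ is large enough. By the definition of
$\mathcal{I}(\theta^*, I(\cdot \mid r))$ in (23), the mean value theorem for integration and the existence of
$\mathcal{I}^{-1}(\theta)$, we have

$$\begin{aligned}
\lim_{r \to \infty} \mathcal{I}(\theta^*, I(\cdot \mid r))
&= \lim_{r \to \infty} \sum_{i=1}^{2^r} \frac{1}{\int_{A_i} p(y \mid \theta) \, dy} \left( \frac{\partial}{\partial \theta_1} \int_{A_i} p(y \mid \theta) \, dy, \ldots, \frac{\partial}{\partial \theta_k} \int_{A_i} p(y \mid \theta) \, dy \right)' \left( \frac{\partial}{\partial \theta_1} \int_{A_i} p(y \mid \theta) \, dy, \ldots, \frac{\partial}{\partial \theta_k} \int_{A_i} p(y \mid \theta) \, dy \right) \\
&= \lim_{r \to \infty} \sum_{i=5}^{2^r} \frac{1}{p(y_i \mid \theta) \Delta_i} \left( \frac{\partial}{\partial \theta_1} p(y_i \mid \theta) \Delta_i, \ldots, \frac{\partial}{\partial \theta_k} p(y_i \mid \theta) \Delta_i \right)' \left( \frac{\partial}{\partial \theta_1} p(y_i \mid \theta) \Delta_i, \ldots, \frac{\partial}{\partial \theta_k} p(y_i \mid \theta) \Delta_i \right) + 4 o(\epsilon_r) \quad (y_i \in A_i, \ \epsilon_r \to 0) \\
&= \lim_{r \to \infty} \sum_{i=5}^{2^r} \frac{1}{p(y_i \mid \theta)} \left( \frac{\partial}{\partial \theta_1} p(y_i \mid \theta), \ldots, \frac{\partial}{\partial \theta_k} p(y_i \mid \theta) \right)' \left( \frac{\partial}{\partial \theta_1} p(y_i \mid \theta), \ldots, \frac{\partial}{\partial \theta_k} p(y_i \mid \theta) \right) \Delta_i + 4 o(\epsilon_r) \\
&= \int \frac{1}{p(y \mid \theta)} \left( \frac{\partial}{\partial \theta_1} p(y \mid \theta), \ldots, \frac{\partial}{\partial \theta_k} p(y \mid \theta) \right)' \left( \frac{\partial}{\partial \theta_1} p(y \mid \theta), \ldots, \frac{\partial}{\partial \theta_k} p(y \mid \theta) \right) dy \\
&= \mathcal{I}(\theta),
\end{aligned}$$

where $\Delta_i$ denotes the area of $A_i$.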
For the higher-dimensional case $\mathbb{R}^m$, $m > 2$, the proof is similar. Therefore, we have the theorem.
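
The monotonicity in Theorem 3.5 can be observed numerically in a toy scalar case. The sketch below is an illustration under assumptions that are not in the paper: $Y \sim N(\theta, 1)$ with a location parameter, and an $r$-bit quantizer whose $2^r$ cells are a dyadic partition of a fixed interval $[-4, 4]$ with the two outer cells extended to infinity, so that each partition refines the previous one. Because the interval is fixed, the values approach but do not reach the raw-data Fisher information, which is 1 here.

```python
import math

def pdf(x):
    """Standard normal pdf (returns 0.0 at +/- infinity)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def fisher_info(r, theta=0.0, half_width=4.0):
    """Fisher information about theta in an r-bit quantized Y ~ N(theta, 1)."""
    cells = 2 ** r
    step = 2.0 * half_width / cells
    cuts = [-math.inf] + [-half_width + step * i for i in range(1, cells)] + [math.inf]
    info = 0.0
    for a, b in zip(cuts[:-1], cuts[1:]):
        p = cdf(b - theta) - cdf(a - theta)     # cell probability
        dp = pdf(a - theta) - pdf(b - theta)    # derivative of p with respect to theta
        if p > 0.0:
            info += dp * dp / p
    return info

infos = [fisher_info(r) for r in range(1, 9)]
crlbs = [1.0 / i for i in infos]  # per-sample Cramer-Rao lower bounds
```

With one bit and the threshold at the true mean, the information is $2/\pi$, and each extra bit increases it, so the bound `crlbs` decreases with `r`.
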
3.3 Robust maximum likelihood estimation with quantized data 

When the number of bits $r_i$ for each sensor is given, a natural question is how the quantizers
$I(\cdot \mid r)$ should be designed such that the asymptotic variance
$\mathcal{I}^{-1}(\theta^*, I(\cdot \mid r))$ of MLE with quantized data is as small as possible. The true
parameter $\theta^*$, however, is not known, i.e., the pdf is not known. To the best of our knowledge, most
of the existing work on optimal quantizers depends on the pdf or signal models. When neither of them is
known, the optimal quantizer design cannot be derived in general. In the framework of this paper, we do not
assume a Gaussian pdf or knowledge of measurement models. Thus, it is impossible to derive the optimal
quantizer. However, we can choose multiple groups of quantizers to guard against the risk of a poor
quantizer. Therefore, we have the following robust MLE design scheme.

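For notational simplicity, let $r_i = 1$, $i = 1, \ldots, L$, and $r = L$.

1. Choose $J$ groups of different quantizers (see Remark 3.7 for a discussion on the choice of $J$ groups of
quantizers)

$$I^{(j)}(y) \triangleq (I_1^{(j)}(y_1), \ldots, I_L^{(j)}(y_L))' \in \mathbb{R}^L, \quad j = 1, \ldots, J, \tag{37}$$

where

$$I_i^{(j)}(y_i): y_i \in \mathbb{R}^{k_i} \to \{0, 1\}, \quad i = 1, \ldots, L. \tag{38}$$

2. Observe $N_j$ joint observation samples $\{(Y_{1n_j}, \ldots, Y_{Ln_j})\}_{n_j=1}^{N_j}$ which are
quantized by the $j$-th group of quantizers for $j = 1, \ldots, J$. We denote $N = \sum_{j=1}^{J} N_j$. The
quantized observation samples $\{(I_1^{(j)}(Y_{1n_j}), \ldots, I_L^{(j)}(Y_{Ln_j}))\}_{n_j=1}^{N_j}$ are
denoted by $\{(U_{1n_j}^{(j)}, \ldots, U_{Ln_j}^{(j)})\}_{n_j=1}^{N_j}$. Moreover, we denote
$\vec{U}^{(j)} = \{(U_{1n_j}^{(j)}, \ldots, U_{Ln_j}^{(j)})\}_{n_j=1}^{N_j}$. The corresponding quantized
observation population based on $I^{(j)}(y)$ is denoted by $U^{(j)}$ whose pmf is

$$f_U^{(j)}(u_1, u_2, \ldots, u_L \mid \theta), \quad j = 1, \ldots, J, \tag{39}$$

which can be obtained similarly to (12).

3. Estimate the parameter $\theta$ with the $J$ groups of quantized samples by maximizing the log likelihood
function:

$$\begin{align}
l(\theta \mid \vec{U}^{(1)}, \ldots, \vec{U}^{(J)}) &= \log \prod_{j=1}^{J} \prod_{n_j=1}^{N_j} f_U^{(j)}(U_{1n_j}^{(j)}, U_{2n_j}^{(j)}, \ldots, U_{Ln_j}^{(j)} \mid \theta) \tag{40} \\
&= \sum_{j=1}^{J} \sum_{n_j=1}^{N_j} \log f_U^{(j)}(U_{1n_j}^{(j)}, U_{2n_j}^{(j)}, \ldots, U_{Ln_j}^{(j)} \mid \theta) \notag \\
&= \sum_{j=1}^{J} \sum_{n_j=1}^{N_j} \log \int_{\{I_1^{(j)}(y_1) = U_{1n_j}^{(j)}, \ldots, I_L^{(j)}(y_L) = U_{Ln_j}^{(j)}\}} p(y_1, \ldots, y_L \mid \theta) \, dy_1 \cdots dy_L \tag{41} \\
&= \sum_{j=1}^{J} l(\theta \mid \vec{U}^{(j)}), \tag{42}
\end{align}$$

where $l(\theta \mid \vec{U}^{(j)})$ is the log likelihood function of the $j$-th group of quantized data
$\vec{U}^{(j)}$. Equivalently, we solve the equation:

$$\frac{\partial}{\partial \theta} l(\theta \mid \vec{U}^{(1)}, \ldots, \vec{U}^{(J)}) = 0, \tag{43}$$

whose solution is called the robust MLE and denoted by $\hat{\theta}_R$.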
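
The three steps above can be sketched in Python for a toy scalar case. Everything specific here is an assumption for illustration and not the paper's setup: one sensor, $Y \sim N(\theta, 1)$, $J = 4$ one-bit quantizers $1[y > c_j]$ with hypothetical thresholds, equal group sizes, and a grid search in place of solving (43).

```python
import math
import random

def upper_tail(x):
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def robust_log_likelihood(theta, groups):
    """Sum over quantizer groups of the one-bit log likelihoods.

    groups: list of (threshold c_j, number of ones, number of zeros)
    for samples of Y ~ N(theta, 1) quantized by 1[y > c_j].
    """
    total = 0.0
    for c, n1, n0 in groups:
        total += n1 * math.log(upper_tail(c - theta))   # bit 1: y > c
        total += n0 * math.log(upper_tail(theta - c))   # bit 0: y <= c
    return total

def robust_mle(groups, lo=-5.0, hi=5.0, steps=2001):
    """Grid search used here instead of solving the likelihood equation."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: robust_log_likelihood(t, groups))

rng = random.Random(1)
theta_true = 1.0
thresholds = [0.0, 1.0, 2.0, 3.0]   # step 1: J = 4 hypothetical quantizers
n_per_group = 1000                  # step 2: N_j samples per quantizer
groups = []
for c in thresholds:
    ones = sum(1 for _ in range(n_per_group) if rng.gauss(theta_true, 1.0) > c)
    groups.append((c, ones, n_per_group - ones))
theta_hat = robust_mle(groups)      # step 3: maximize the summed log likelihood
```

Each group contributes its own term to the sum, so a poorly placed threshold (such as 3.0 here) adds little information but does not prevent the other groups from locating $\theta$.
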

Obviously, the $N$ quantized samples are not identically distributed due to the $J$ different quantizers. One
may ask whether the new estimator based on the different quantizers is still asymptotically efficient,
what its asymptotic variance is, and why it is robust compared to using one group of quantizers.
These questions are answered analytically by the following theorem.

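Theorem 3.6. Consider $J$ groups of different sensor quantizers $I^{(j)}(y)$ defined by (37),
$j = 1, \ldots, J$. Assume that $p(y_1, y_2, \ldots, y_L \mid \theta)$ and quantizers $I^{(j)}(y)$ generate the
quantized samples and the quantized pmf $f_U^{(j)}(u_1, u_2, \ldots, u_L \mid \theta)$ defined by (39)
satisfies the regularity conditions (C1)-(C7). Then,

$$\sqrt{N}(\hat{\theta}_R - \theta^*) \longrightarrow N\big(0, \mathcal{I}^{-1}(\theta^*; I^{(1)}(\cdot), \ldots, I^{(J)}(\cdot))\big) \tag{44}$$

where $N = \sum_{j=1}^{J} N_j$, $N_j \to \infty$, $j = 1, \ldots, J$,

$$\begin{align}
\mathcal{I}^{-1}(\theta^*; I^{(1)}(\cdot), \ldots, I^{(J)}(\cdot)) &= \left( \sum_{j=1}^{J} \frac{N_j}{N} \mathcal{I}(\theta^*; I^{(j)}(\cdot)) \right)^{-1} \tag{45} \\
&= N \left( \sum_{j=1}^{J} N_j \mathcal{I}(\theta^*; I^{(j)}(\cdot)) \right)^{-1}. \tag{46}
\end{align}$$

$\left( \sum_{j=1}^{J} N_j \mathcal{I}(\theta^*; I^{(j)}(\cdot)) \right)^{-1}$ is the Cramer-Rao lower bound
for $N$ quantized samples, where $\mathcal{I}(\theta^*; I^{(j)}(\cdot))$ is the Fisher information matrix for
one quantized sample of $U^{(j)}$. That is, $\hat{\theta}_R$ is a consistent and asymptotically efficient
estimator of $\theta^*$.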

Moreover, without loss of generality, we assume that the first group of quantizers corresponds to the worst
asymptotic variance

$$\mathcal{I}^{-1}(\theta^*; I^{(1)}(\cdot)) \succeq \mathcal{I}^{-1}(\theta^*; I^{(j)}(\cdot)), \quad j = 2, \ldots, J, \tag{47}$$

then

$$\mathcal{I}^{-1}(\theta^*; I^{(1)}(\cdot), \ldots, I^{(J)}(\cdot)) \preceq \mathcal{I}^{-1}(\theta^*; I^{(1)}(\cdot)). \tag{48}$$

That is, $\hat{\theta}_R$ is a robust estimator of $\theta^*$.
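
A scalar illustration of the variance formula in Theorem 3.6 follows. It assumes, purely for this example, $Y \sim N(\theta, 1)$, one-bit quantizers $1[y > c_j]$ with hypothetical thresholds and equal weights $N_j / N$; the per-sample Fisher information of such a bit is $\phi(c - \theta)^2 / (p(1 - p))$ with $p = P(Y > c)$.

```python
import math

def one_bit_info(c, theta):
    """Fisher information about theta in the bit 1[Y > c] for Y ~ N(theta, 1)."""
    z = c - theta
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(Y > c)
    return phi * phi / (tail * (1.0 - tail))

theta_star = 1.0
thresholds = [0.0, 1.0, 2.0, 3.0]    # hypothetical J = 4 quantizers
weights = [0.25, 0.25, 0.25, 0.25]   # N_j / N

infos = [one_bit_info(c, theta_star) for c in thresholds]
# Scalar form of the theorem: inverse of the convex combination of informations.
robust_var = 1.0 / sum(w * i for w, i in zip(weights, infos))
worst_var = 1.0 / min(infos)         # single worst quantizer
best_var = 1.0 / max(infos)          # single best quantizer
```

The robust variance always lies between the best and the worst single-quantizer variances, which is the scalar content of the robustness statement.
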



Proof. Since the regularity of $p(y_1, y_2, \ldots, y_L \mid \theta)$ and quantizers $I^{(j)}(y)$ ensures that
the quantized samples and the corresponding pmf $f_U^{(j)}(u_1, u_2, \ldots, u_L \mid \theta)$ defined by (39)
satisfy the regularity conditions (C1)-(C4), it is easy to prove, as in Lemma 3.1, that Equation (43) has a
solution denoted by $\hat{\theta}_R$ which is a consistent estimator of $\theta^*$, i.e.,
$\hat{\theta}_R \to \theta^*$, a.s. The proof is similar to that of Lemma 3.1 (see, e.g., [28, 29]). However,
the $N$ quantized samples are independent but not identically distributed due to the use of different
quantizers. Thus, to prove the asymptotic normality, we will use the Lyapunov central limit theorem by
checking the Lyapunov condition (see, e.g., [29]). Simultaneously, the Cramer-Wold device (see, e.g., [29])
will be used to deal with the high dimensional estimated parameters.

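First, we expand the first derivative of the log likelihood function (40) around the true value $\theta^*$,

$$\frac{\partial l(\theta \mid \vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta} = \frac{\partial l(\theta \mid \vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta} \bigg|_{\theta^*} + \frac{\partial^2 l(\theta \mid \vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta^2} \bigg|_{\theta^*} (\theta - \theta^*) + \cdots \tag{49}$$

where higher-order terms can be ignored under regularity conditions (see, e.g., [28]). Substituting
$\hat{\theta}_R$ for $\theta$ and using (43), we have

$$0 = \frac{\partial l(\theta \mid \vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta} \bigg|_{\hat{\theta}_R} = \frac{\partial l(\theta \mid \vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta} \bigg|_{\theta^*} + \frac{\partial^2 l(\theta \mid \vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta^2} \bigg|_{\theta^*} (\hat{\theta}_R - \theta^*) + \cdots$$

Thus,

$$\sqrt{N}(\hat{\theta}_R - \theta^*) = \left( -\frac{1}{N} \frac{\partial^2 l(\theta \mid \vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta^2} \bigg|_{\theta^*} \right)^{-1} \frac{1}{\sqrt{N}} \frac{\partial l(\theta \mid \vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta} \bigg|_{\theta^*}. \tag{50}$$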



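Then, we check the Lyapunov condition. Denote $S_N^2 = \sum_{j=1}^{J} N_j \mathcal{I}(\theta^*, I^{(j)})$,
$(S_N^\tau)^2 = \sum_{j=1}^{J} N_j \tau' \mathcal{I}(\theta^*, I^{(j)}) \tau$ for an arbitrary $\tau$, and

$$M_j \triangleq E\left| \tau' \frac{\partial}{\partial \theta} \log f_U^{(j)}(U_{1n_j}, U_{2n_j}, \ldots, U_{Ln_j} \mid \theta) \right|^4, \tag{51}$$

which exists, since condition (C3) is satisfied and $U^{(j)}$ has a categorical distribution. Moreover, by
condition (C5) and (51),

$$\begin{aligned}
& \lim_{N \to \infty} \frac{1}{(S_N^\tau)^4} \sum_{j=1}^{J} \sum_{n_j=1}^{N_j} E\left| \tau' \frac{\partial}{\partial \theta} \log f_U^{(j)}(U_{1n_j}, U_{2n_j}, \ldots, U_{Ln_j} \mid \theta) - E\left[ \tau' \frac{\partial}{\partial \theta} \log f_U^{(j)}(U_{1n_j}, U_{2n_j}, \ldots, U_{Ln_j} \mid \theta) \right] \right|^4 \\
&= \lim_{N \to \infty} \frac{1}{(S_N^\tau)^4} \sum_{j=1}^{J} \sum_{n_j=1}^{N_j} E\left| \tau' \frac{\partial}{\partial \theta} \log f_U^{(j)}(U_{1n_j}, U_{2n_j}, \ldots, U_{Ln_j} \mid \theta) - 0 \right|^4 \\
&\le \lim_{N \to \infty} \frac{1}{(S_N^\tau)^4} \sum_{j=1}^{J} \sum_{n_j=1}^{N_j} M_j \\
&\le \lim_{N \to \infty} \frac{1}{(S_N^\tau)^4} N \max\{M_1, \ldots, M_J\} \\
&\le \lim_{N \to \infty} \frac{N \max\{M_1, \ldots, M_J\}}{\big( N \min\{\tau' \mathcal{I}(\theta^*, I^{(1)}) \tau, \ldots, \tau' \mathcal{I}(\theta^*, I^{(J)}) \tau\} \big)^2} \\
&= \lim_{N \to \infty} \frac{1}{N} \frac{\max\{M_1, \ldots, M_J\}}{\big( \min\{\tau' \mathcal{I}(\theta^*, I^{(1)}) \tau, \ldots, \tau' \mathcal{I}(\theta^*, I^{(J)}) \tau\} \big)^2} = 0.
\end{aligned}$$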
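That is, the Lyapunov condition is satisfied. Thus, by the Lyapunov central limit theorem (see, e.g., [29]),
for all $\tau$,

$$\frac{1}{\sqrt{N}} \tau' \frac{\partial l(\theta \mid \vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta} \longrightarrow N\left(0, \frac{(S_N^\tau)^2}{N}\right).$$

Moreover, by the Cramer-Wold device (see, e.g., [29]), we have

$$\frac{1}{\sqrt{N}} \frac{\partial l(\theta \mid \vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta} \longrightarrow N\left(0, \frac{S_N^2}{N}\right). \tag{52}$$

By application of the weak law of large numbers, we have

$$-\frac{1}{N_j} \frac{\partial^2 l(\theta \mid \vec{U}^{(j)})}{\partial \theta^2} \longrightarrow \mathcal{I}(\theta^*; I^{(j)}(\cdot)), \quad j = 1, \ldots, J, \tag{53}$$

where $l(\theta \mid \vec{U}^{(j)})$ is defined in (42).

By Slutsky's theorem and (53), we have

$$-\frac{1}{N} \frac{\partial^2 l(\theta \mid \vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta^2} = -\sum_{j=1}^{J} \frac{N_j}{N} \frac{1}{N_j} \frac{\partial^2 l(\theta \mid \vec{U}^{(j)})}{\partial \theta^2} \longrightarrow \sum_{j=1}^{J} \frac{N_j}{N} \mathcal{I}(\theta^*; I^{(j)}(\cdot)) = \frac{S_N^2}{N}. \tag{54}$$

Based on (50), (52), (54) and Slutsky's theorem, we have

$$\sqrt{N}(\hat{\theta}_R - \theta^*) \longrightarrow N\big(0, \mathcal{I}^{-1}(\theta^*; I^{(1)}(\cdot), \ldots, I^{(J)}(\cdot))\big) \tag{55}$$

where $\mathcal{I}^{-1}(\theta^*; I^{(1)}(\cdot), \ldots, I^{(J)}(\cdot))$ is defined by (45).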
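Furthermore, by (45), (47) and the condition (C7) in (23),

$$\mathcal{I}^{-1}(\theta^*; I^{(1)}(\cdot), \ldots, I^{(J)}(\cdot)) = \left( \sum_{j=1}^{J} \frac{N_j}{N} \mathcal{I}(\theta^*; I^{(j)}(\cdot)) \right)^{-1} \preceq \left( \sum_{j=1}^{J} \frac{N_j}{N} \mathcal{I}(\theta^*; I^{(1)}(\cdot)) \right)^{-1} = \mathcal{I}^{-1}(\theta^*; I^{(1)}(\cdot)).$$

Therefore, we have the theorem.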

Remark 3.7. In the first step of the robust MLE scheme, how do we choose $J$ different quantizers?

1. If some prior information is available, for example from previous experience or feedback, such as the
fact that thresholds should be in a bounded region, the $J$ groups of different quantizers can be
uniformly chosen in that region. If there is no prior information available, then some real-time
training samples are required. A basic heuristic criterion is not to let all samples fall in the 0 region
or in the 1 region. For example, for the $i$-th sensor, we can choose $J$ different quantizers by
observing $N_{si}$ training samples. First, find the $p$-quantile¹ and $(1 - p)$-quantile of the $N_{si}$
training samples, and then uniformly choose $J$ different thresholds between the $p$-quantile and the
$(1 - p)$-quantile (assume that $p < 1 - p$). If $N_{si}$ is large, one can choose $p$ close to $1/2$, and
vice versa.

2. How to determine $J$? There is a tradeoff between robustness, optimality and memory requirements.
Theorem 3.6 shows that the asymptotic variance is the inverse of a convex linear combination
of Fisher information matrices based on $J$ different groups of quantizers. Thus, a larger $J$ means more
robustness compared with the worst case. However, it may also mean worse performance, since the
best quantizer is averaged with other, worse ones. In addition, note that the above procedure requires
the $N_{si}$ training samples to be stored at the $i$-th sensor. Thus, $N_{si}$ cannot be too large for
sensors with limited memory, and consequently the median of the samples may be randomly biased with
respect to the median of the population. If $N_{si}$ is large, one can choose one quantizer ($J = 1$) with
a single threshold at the sample median, which may be the "best" quantizer. However, if $N_{si}$ is small,
the bias may be large and may result in very poor performance. In that case, it is necessary to choose a
larger $J$. Based on experiments that we have run, $J$ ranging between 2 and 10 is likely to yield good
results.
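
The training-sample heuristic of item 1 can be sketched as follows. This is one possible reading, with several assumptions labeled in the code: the empirical quantile is taken as an order statistic, the $J$ thresholds are evenly spaced with both quantiles included as endpoints, and the function name `choose_thresholds` and the training distribution are hypothetical.

```python
import random

def choose_thresholds(training, J, p):
    """Pick J thresholds evenly spaced between the empirical p- and (1-p)-quantiles.

    Assumption: endpoints are included; the remark only says the thresholds
    are chosen uniformly between the two quantiles.
    """
    if not 0.0 < p < 0.5:
        raise ValueError("p must satisfy 0 < p < 1 - p")
    ordered = sorted(training)
    n = len(ordered)
    q_lo = ordered[int(p * (n - 1))]           # simple empirical p-quantile
    q_hi = ordered[int((1.0 - p) * (n - 1))]   # simple empirical (1-p)-quantile
    if J == 1:
        return [0.5 * (q_lo + q_hi)]
    return [q_lo + (q_hi - q_lo) * j / (J - 1) for j in range(J)]

rng = random.Random(2)
# Hypothetical N_si = 50 training samples at one sensor.
training = [rng.gauss(5.0, 6.0) for _ in range(50)]
thresholds = choose_thresholds(training, J=4, p=0.25)
```

Keeping every threshold between the two quantiles guarantees that each quantizer sees both 0s and 1s in the training data, which is the stated heuristic criterion.
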

¹ Here, the $p$-quantile ($0 < p < 1$) is the point below which the cumulative probability of the samples is
$p$. Thus, the $1/2$-quantile is the median.
4 Numerical Examples



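Let us consider a two-sensor Gaussian example: $(Y_1, Y_2) \sim N(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$
with joint pdf as follows

$$p(y_1, y_2 \mid \theta) = \frac{1}{2\pi \sqrt{1 - \rho^2} \, \sigma_1 \sigma_2} \exp\left\{ \frac{-1}{2(1 - \rho^2)} \left[ \left( \frac{y_1 - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{y_1 - \mu_1}{\sigma_1} \right) \left( \frac{y_2 - \mu_2}{\sigma_2} \right) + \left( \frac{y_2 - \mu_2}{\sigma_2} \right)^2 \right] \right\}$$

where the parameter vector to be estimated is $\theta = [\theta_0, \theta_1, \theta_2] = [\rho, \mu_1, \mu_2]$.
We will assume $\mu_1 = 5$, $\mu_2 = 7$, $\sigma_1 = 6$, $\sigma_2 = 8$ and obtain results for several
different values of $\rho = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]$. We will compare the robust MLE with MLE
based on a single quantizer. Assume that the prior information is that thresholds are in $[0, 15]$. We
uniformly choose the following four groups of different quantizers.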



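$$
\begin{aligned}
I^{(1)}(y) &= \left(I_1^{(1)}(y_1),\, I_2^{(1)}(y_2)\right) = \left(I[y_1 - 15],\, I[y_2 - 15]\right), \\
I^{(2)}(y) &= \left(I_1^{(2)}(y_1),\, I_2^{(2)}(y_2)\right) = \left(I[y_1 - 10],\, I[y_2 - 10]\right), \\
I^{(3)}(y) &= \left(I_1^{(3)}(y_1),\, I_2^{(3)}(y_2)\right) = \left(I[y_1 - 5],\, I[y_2 - 5]\right), \\
I^{(4)}(y) &= \left(I_1^{(4)}(y_1),\, I_2^{(4)}(y_2)\right) = \left(I[y_1 - 0],\, I[y_2 - 0]\right),
\end{aligned}
$$

where

$$
I[x - c] = \begin{cases} 1, & \text{if } x \ge c, \\ 0, & \text{if } x < c. \end{cases}
$$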



For the robust MLE, we let $N_1 = N_2 = N_3 = N_4 = N/4$, where $N$ is the number of samples used for the MLE
with each fixed quantizer $I^{(j)}(y)$, $j = 1, 2, 3, 4$.
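
To make this setup concrete, the following minimal sketch generates the quantized data that the robust scheme would see. It is an illustration, not the simulation code used for the figures. The names `draw_pair`, `indicator` and `robust_quantized_counts` are hypothetical, the values 6 and 8 are read as standard deviations, the indicator uses $x \ge c$, and the random seed is arbitrary. For each quantizer group the sketch returns the counts of the four possible two-bit outcomes, which are the only data the quantized likelihood depends on.

```python
import math
import random

# Example values; reading 6 and 8 as standard deviations is an assumption.
MU1, MU2 = 5.0, 7.0
SIGMA1, SIGMA2 = 6.0, 8.0
THRESHOLDS = [15.0, 10.0, 5.0, 0.0]  # common threshold of quantizer groups 1..4

def draw_pair(rho, rng):
    """Draw (y1, y2) from the bivariate Gaussian with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    y1 = MU1 + SIGMA1 * z1
    y2 = MU2 + SIGMA2 * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return y1, y2

def indicator(x, c):
    """One-bit quantizer: 1 if x >= c, else 0."""
    return 1 if x >= c else 0

def robust_quantized_counts(N, rho, seed=0):
    """Split N samples evenly over the groups and count each two-bit outcome."""
    rng = random.Random(seed)
    per_group = N // len(THRESHOLDS)  # N_j = N / 4 for four groups
    counts = []
    for c in THRESHOLDS:
        cell = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
        for _ in range(per_group):
            y1, y2 = draw_pair(rho, rng)
            cell[(indicator(y1, c), indicator(y2, c))] += 1
        counts.append(cell)
    return counts
```

Calling `robust_quantized_counts(1200, 0.5)` returns four dictionaries of cell counts, one per quantizer group, each summing to 300. An MLE with a single fixed quantizer would instead use all $N$ samples with one threshold.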

In Figs. 2-4, the theoretical CRLBs for different numbers of measurements are plotted for the parameters
$\theta = [0.5, 5, 7]$, respectively. The numbers of samples $N = 800, 900, 1000, 1100, 1200$ are considered.
The corresponding MSEs based on 2000 Monte Carlo runs are plotted in Figs. 5-7, respectively.
In Fig. 8, the theoretical CRLBs for different values of $\rho$ and $N = 1200$ are plotted for the parameters
$\theta_0 \in \{0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8\}$, $\theta_1 = 5$ and $\theta_2 = 7$. The corresponding MSEs based on
2000 Monte Carlo runs are plotted for $\theta_0$, $\theta_1$ and $\theta_2$ in Fig. 9.

From Figs. 2-9, we have the following observations:

1. Both the theoretical CRLBs and the MSEs based on 2000 Monte Carlo runs for the robust MLE are much smaller
than those of the MLE based on the single quantizer that is worst in the group. This is
consistent with the analytical result in Theorem 3.6. The robust MLE is a conservative estimate,
but it avoids large errors in the worst case. The advantage of robustness (MSE of the worst MLE
minus MSE of the robust MLE) is much larger than the loss due to conservative estimation
(MSE of the robust MLE minus MSE of the best one), especially in Figs. 2, 5 and 6.

2. From Figs. 8-9, an interesting phenomenon is that the theoretical CRLBs of the marginal parameters
are almost equal across the different values of the correlation parameter between the two sensors. This
suggests that, in this numerical example, the estimation accuracy of the marginal parameters is nearly
insensitive to the correlation between the sensors. Thus, when one is only interested in estimating a
marginal parameter in such a setting, the correlation between sensors need not be considered.

3. From Figs. 8-9, both the theoretical CRLBs and the MSEs based on 2000 Monte Carlo runs of $\theta_0$ are smaller
than those of $\theta_1$ and $\theta_2$. The reason may be that the value of $\theta_0$ is smaller than those of $\theta_1$ and $\theta_2$.
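
Observation 1 can be illustrated with a deliberately simplified scalar calculation. The sketch below is not the joint Fisher information matrix of the paper: it considers only sensor 1, treats $\sigma_1 = 6$ as a known standard deviation (an assumption), ignores $Y_2$ and $\rho$, and uses the standard Fisher information about the mean carried by a single bit $1[Y \ge c]$ with $Y \sim \mathcal{N}(\mu, \sigma^2)$. The robust scheme is modelled as an equal-weight average of the four per-quantizer informations, with weights $N_j/N = 1/4$ assumed from the sample allocation above. The function name `one_bit_fisher_info` is hypothetical.

```python
import math

def one_bit_fisher_info(mu, sigma, c):
    """Fisher information about mu in one bit b = 1[Y >= c], Y ~ N(mu, sigma^2)."""
    z = (c - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal pdf at z
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal cdf at z
    return pdf * pdf / (sigma * sigma * cdf * (1.0 - cdf))

mu1, sigma1 = 5.0, 6.0
thresholds = [15.0, 10.0, 5.0, 0.0]
N = 1200

# Per-sample information of each single quantizer and the resulting bound with N samples.
infos = [one_bit_fisher_info(mu1, sigma1, c) for c in thresholds]
single_bounds = [1.0 / (N * info) for info in infos]

# Robust scheme: N/4 samples per quantizer, so the informations are averaged.
robust_info = sum(infos) / len(infos)
robust_bound = 1.0 / (N * robust_info)

for c, bound in zip(thresholds, single_bounds):
    print(f"threshold {c:5.1f}: bound {bound:.4f}")
print(f"robust scheme   : bound {robust_bound:.4f}")
```

In this simplified setting the threshold equal to the true mean ($c = 5$) is the most informative and $c = 15$ the least, and the robust bound always lies between the best and the worst single-quantizer bounds, because the inverse of an average cannot exceed the largest inverse or fall below the smallest.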

5 Conclusion 

In this paper, an approach for robust distributed MLE with quantized data has been proposed under the
assumption that the structure of the joint pdf is known, but it contains unknown deterministic parameters. First,
we discussed the regularity conditions which should be satisfied by the pdf and quantizers such that the MLE
with quantized data is asymptotically efficient. Then, we analytically showed that the asymptotic variance of
the MLE with quantized data is monotonically decreasing in the number of quantization bits and has a lower limit,
which is equal to the asymptotic variance of the MLE with raw measurements. When the number of quantization
bits is given, a robust distributed MLE scheme was designed by employing $J$ different quantizers. Its
asymptotic efficiency was proved under some regularity conditions and the asymptotic variance was derived
to be the inverse of a convex linear combination of Fisher information matrices based on $J$ different
quantizers. Thus, the robustness was analytically shown. A numerical example with a joint Gaussian pdf was
considered. Simulations show that the new MLE scheme is robust and much better than that based on the
worst quantization scheme among the groups of quantizers. Another interesting phenomenon is that the
asymptotic variance of the marginal parameters is nearly independent of the correlation between the two sensors.

Future work will involve the application of the robust MLE with quantized data to distributed location
estimation, distributed detection fusion and hypothesis testing using heterogeneous data.

Figure 2: Theoretical CRLBs of the MLE of $\theta_0$ while using raw measurements, different quantizers and the
robust MLE of $\theta_0$ for different numbers of measurements.

Figure 3: Theoretical CRLBs of the MLE of $\theta_1$ while using raw measurements, different quantizers and the
robust MLE of $\theta_1$ for different numbers of measurements.

Figure 4: Theoretical CRLBs of the MLE of $\theta_2$ while using raw measurements, different quantizers and the
robust MLE of $\theta_2$ for different numbers of measurements.

Figure 5: MSEs of the MLE of $\theta_0$ based on 2000 Monte Carlo runs while using raw measurements, different
quantizers and the robust MLE of $\theta_0$ for different numbers of measurements.

Figure 6: MSEs of the MLE of $\theta_1$ based on 2000 Monte Carlo runs while using raw measurements, different
quantizers and the robust MLE of $\theta_1$ for different numbers of measurements.

Figure 7: MSEs of the MLE of $\theta_2$ based on 2000 Monte Carlo runs while using raw measurements, different
quantizers and the robust MLE of $\theta_2$ for different numbers of measurements.

Figure 8: Theoretical CRLBs of the robust MLE of $\theta_0$, $\theta_1$ and $\theta_2$ for different values of the correlation parameter $\rho$.

Figure 9: MSEs of the robust MLE of $\theta_0$, $\theta_1$ and $\theta_2$ based on 2000 Monte Carlo runs for different values of the
correlation parameter $\rho$.